Experiment B - Combined CD45+/-

Separate Fat & Tumor Cell Groups for LigXRec Analysis

Version 1.0 (Date: 7/19/21)

Intro and Objective

Evidence exists from cancer tissue studies showing cellular interactions between tumor and peritumoral fat tissues. These include tumor tissues using fat reserves in adipocytes as sources of energy in areas of increased hypoxia and oxidative stress and immunosuppression in the tumor microenvironment due to genetic deregulations leading to shifts in necessary intercellular functions such as chemokine production. While new treatments such as immune checkpoint inhibitors have made great strides in reducing the burden of many cancers, the interplay between stromal tissues and tumors must not go unnoted as there have been repeated correlations between tumor microenvironment factors and immune checkpoint treatment efficacy.

This study aims to elucidate the cellular and genetic interactions between tumor and peritumoral fat tissue in a murine B16 melanoma model when treated with anti-PD-1 immune checkpoint inhibitor treatment. The below data analysis examines the scRNA-seq data of ~21K filtered genes in ~88K cells taken from either isotype- or anti-PD-1 treated B6 mice. scRNA-seq data was acquired using CellRanger. This single-cell data was then analyzed using the Scanpy toolkit (Theis et al., 2018, ver. 1.6.0, doi: 10.1186/s13059-017-1382-0) for subsequent cell group clustering and mapping to be used for examining cellular and genetic interactions with and without anti-PD-1 treatment.

Primary Analyst: Jacob Harris (jacob.harris@novartis.com)

Co-Analyst, In vivo Data Acquisition: Bob Haines (bob.haines@novartis.com)

Packages

Data Preprocessing

Preprocessed data of scRNA-seq can be found in L-Drive folder L://usca-dfs/LABDATA/LABS/IO/USCA-VIRTUAL-I22153/CIPOLDA3/2021/JacobH/Bob_postdoc/ExpB_v1.0_final

Throughout our analysis, we relied on clustering algorithms in order to group similar cells for the purposes of visualizations and cell type identification. Scanpy uses an algorithm known as the Leiden algorithm, one that is more recent than the more well-known Louvain algorithm which has shown improved community recognition and speed (Traag et al., 2019, doi: 10.1038/s41598-019-41695-z).

Normal anndata is stored as a numpy array matrix (adata.X) with rows corresponding to single cells (adata.obs) and columns to gene expression levels (adata.var) which can be frozen as an adata.raw array with normalized values to be used for differential testing and visualizations. Normally, this is done when tagging and filtering out any genes which are deemed "not highly variable" in order to save on computation time for analysis and focus in on a select group set of genes. However, we wanted to use the raw, unnormalized adata as our dataset to hopefully gain insight into all of the genes present in our cells.

However, as the normal Scanpy workflow usually involves working with a filtered and normalized adata set, we had to first convert our adata.raw set to a Scipy sparse matrix (adata.raw.X) and then convert that to a numpy array, adata.X.toarray(), which could be saved to the adata.X to convert the raw adata to a form which could be used for computations going forward.

Cluster Cell Type Identification

To get an idea of what types of cells were in each of the identified Leiden clusters, we looked at genetic expression levels of characteristic genes that would help distinguish between Cd45- stromal cells (tumor, endothelial, and fibroblasts) and Cd45+ immune cells. We did much of this using visualization tools, namely matrixplots, to name Leiden cluster numbers based on predicted cell types and optional biomarker genetic tags and/or tissue location (F for fat or T for tumor).

Note: All figures are automatically saved to the 'figures' folder.

We then assembled a list of all of the predicted cell types and assigned each name to each corresponding Leiden cluster number in an adata.obs column 'cluster_name' for each cell. Future users are encouraged to create their own cluster naming conventions based on the nature of their experiment.

Specify Tumor/Fat adata Sets

At this point we separated our full adata set into subsets made up of those cells found in specified locations so that we could compare their genetic and cellular characteristics going forward. These include:

LigXRec Interaction Analysis

The following algorithm allowed us to compare the full scope of genetic and cellular interactions between every cluster based on isotype versus anti-PD-1 treatments and the separated tissue adata subsets. A list of ~2300 curated ligand-receptor gene pairs by Hou et al. (2020, doi: 10.1038/s41467-020-18873-z) serves as a reference for gene pairs within all adata subsets. The genetic and cellular characteristics of each cluster is specified, including the number of genes and cells per cluster along with the means of genetic expression for each gene. The mean genetic expression levels of corresponding ligands and receptors are multiplied to give a product, here deemed LigXRec, to quantify the level of interactions between clusters. Due to these values differing based on treatment, we use these LigXRec values going forward as quantifiable indicators of change in cell-cell interaction within and between our tissue subsets.

At the top of the algorithm is a 'Constants' heading where users should input strings for treatment and adata tissue subset as notated by the comments. For the purposes of this experiment, the entire algorithm is encapsulated within a FOR loop that allows for the comparison of every possible cluster interaction. If users are interested in the interaction between a specific cell cluster and the rest of the adata set, then they should edit either the iCellType (ligand) or jCellType (receptor) variables with the cluster_name variable in question under code chunk 9 inside the 'subCompCellLigRecPairs' function. These cluster variables are notated with comments. However, if users are only interested in the interaction between two specific clusters, then the top FOR loop can be deleted entirely while the iCellType ligand variable and jCellType receptor variable can be altered in code chunk 9 as needed.

Primary LigXRec Interaction Algorithm and pygenomicstools.py Author: Joseph Zhou (joseph.zhou@novartis.com)

NOTE: Make sure to have file 'pygenomicstools.py' in present wd. This process is computationally intensive and requires many hours. If running for all clusters and both treatments, set aside 2-3 days in order to fully complete calculations on a typical local workstation. For comparison, these results were generated on a Macbook Pro with 16GB RAM and a 2.9 GHz quad-core CPU.

LigXRec .csv Files and Pandas DFs

The LigXRec interaction results from every cluster for both treatments are separated into dataframes in which they act as the ligand or receptor in an interaction in the specified treatment. These dataframes are combined to make one table containing every interaction a cluster shows in each treatment. These tables are then merged based on like interactions to give differences in LigXRec values (aPD1 - Iso) between clusters between each ligand-receptor gene pair. All clusters' LigXRec information dataframes are then merged to give an overview of each tissue subtype's LigXRec interactions.

Calculate Raw Gene Scores for Each Cluster in Iso/PD1 Treatments

In order to compare the levels of ligand and receptor gene expression levels based on iso/aPD-1 treatment while accounting for different numbers of cells in each cluster, we created DFs for each cluster which contained cells unique to each cluster along with the raw expression levels of every gene in the adata set. The cluster expression levels of each gene are then averaged to show the overall relative magnitude of ligand and receptor genetic expression in each cluster depending on treatment. This information can be used when examining the importance of LigXRec interaction results as users should be aware that high LigXRec values may not necessarily indicate large numbers of interactions between clusters. For example, relatively low ligand or receptor score values are indicators of low genetic expression, but a high corresponding LigXRec value can indicate notable levels of ligand-receptor bonding when interactions do occur.

Compare LigXRec Scores between (Un)Filtered Iso/PD1 Treatments

When examining the resulting ligand and receptor gene scores for each treatment, we then looked to construct DFs combining every cell-cell and gene-gene pair along with their corresponding isotype and anti-PD-1 LigXRec values in order to evaluate the difference that immunotherapy could have on these interactions in each tissue subset. To try and cut down on the number of interactions, we made a filter which only allowed for tables of ligand and receptor gene scores that are greater than 1 as we felt that these are genes which indicate expression levels notable enough to be seriously considered for the purposes of this analysis. However, we also performed the same analysis on the full, unfiltered dataset for comparison purposes. The greatest differences in LigXRec values between anti-PD-1 and isotype gene and cell pairs were then organized into tables and plotted on heatmaps for visual analysis.

Compare LigXRec Values of Exp. B adata Tissue Subsets

Each tissue subset (FatT, FatC, and OnlyT) was compared to each other at the genetic interaction level to distinguish each set. This includes determining gene pairs that are unique to each set and comparing LigXRec values within each tissue set by comparing gene pairs found in both sets. Calculations of the differences between these values allows for comparisons of the magnitude of change in ligand-receptor interaction due to anti-PD-1 treatment in each tissue.

Plotly